Power Efficient Computing


Portable  Devices 

•  battery  powered 

(or  less) 

•  larger  systems 

minimize  battery  size  /  weight 

Get  as  much  computation 
as  possible... 


Custom Analog ~ 1000 - 10000×
more efficient than Custom Digital
(Mead 1990)

Analog (VMM): 10 MMAC/µW
Digital: 4 MMAC/mW (DSP)


Cortical  Neurons 

•  1000’s  of  inputs, 

•  1000’s  of  channel  populations, 

•  one  output 


Equivalent  computation  ~ 
400MMAC  /  neuron 

(no  learning  /  growth) 


~  roughly  20pW  /  neuron 


Useful  Analog  must  be 
Programmable  /  Configurable 

400 MMAC / neuron at 20 pW...

digital is quite far away (100 mW)
analog VMM closer (100 µW)
analog HMM / dendrites get close...

~  200TMAC 
<500  neurons 

~  40kW  (comp)  with  2000  DSPs 
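
The per-neuron comparison above can be checked with a few lines of arithmetic. The sketch below is only a sanity check under stated assumptions: the efficiency figures are the ones quoted on the slide, while the function and variable names are illustrative. With the 10 MMAC/µW figure the analog result comes out at tens of microwatts, the same order of magnitude as the analog figure quoted above, and the digital result matches the 100 mW figure.

```python
# Power needed to sustain a given MAC rate at a given efficiency.
# Efficiencies are the slide's figures; all names here are illustrative.

MMAC = 1e6  # multiply-accumulates per second in one MMAC


def power_watts(mac_rate, macs_per_joule):
    """Power (W) = MAC rate (MAC/s) / efficiency (MAC/J)."""
    return mac_rate / macs_per_joule


DIGITAL_DSP = 4 * MMAC / 1e-3   # 4 MMAC/mW expressed as MAC per joule
ANALOG_VMM = 10 * MMAC / 1e-6   # 10 MMAC/uW expressed as MAC per joule (10 TMAC/W)

neuron_rate = 400 * MMAC        # equivalent computation per neuron (slide)

digital_w = power_watts(neuron_rate, DIGITAL_DSP)
analog_w = power_watts(neuron_rate, ANALOG_VMM)

print(f"digital DSP: {digital_w * 1e3:.0f} mW per neuron")
print(f"analog VMM:  {analog_w * 1e6:.0f} uW per neuron")
print(f"analog / digital efficiency ratio: {ANALOG_VMM / DIGITAL_DSP:.0f}x")
```

The efficiency ratio of these two figures falls inside the 1000 - 10000× range attributed to Mead.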
Modern System Design
(1837)
Multipliers and Adders Design at Basic Algorithms

[Figure labels: Pitch, Number of Harmonic Peaks, Features, Compression, Decompression, Results]


Vector-Matrix Multiplication
Frequency Decomposition
Adaptive Filters
Classifiers (NN, GMM, HMM)

When building analog systems, we expect to build primitives at the basic algorithm level.


Analog  =  programmable  and  configurable. 
How to get enough analog engineers?

Hierarchy  is  a  key  ingredient  to  the 
success  of  the  digital  circuit,  and,  until 
recently,  one  reason  why  large  analog 
designs  have  been  difficult 
Levels of Energy Efficiency


Subthreshold 
Transistor  Operation 


Highest  throughput  / 
amount  of  power 


Programmable  Circuits 
(FG  transistors) 

•  Eliminate  mismatch 

•  Programmability 




Analog  Signal  Processing 

• ~ ×1000 improvement
in power efficiency


Configurable  Signal 
Processing 

•  Wide  accessibility 


Moving analog approaches / conceptual framework to a system design approach,
similar to digital's system transformation in the 1970s / 80s.

•  Large  need  for  tools  to  compile  /  program  these  systems. 

• Link is most "useful" at the system / signal processing level

•  Education  /  training  /  foundational  theory  is  critical  for  designing. 


These  techniques  open  further  opportunities  to  utilize  /  explore 
biologically  inspired  techniques 
MOS Transistor Derivation


Substrate  p 
Mismatch is significant: 10 mV VT shift
~ 50% bias current variation

κ = 0.58680

I0 = 1.2104 fA
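
To make the sensitivity to threshold mismatch concrete, here is a minimal sketch of the standard exponential subthreshold model, I = I0 exp(κ Vg / UT), using the fit values quoted above. The model form, the thermal voltage of about 25.85 mV at 300 K, and all function names are assumptions for illustration; the slides do not state how the ~50% figure was derived. A 10 mV gate-referred shift with the fitted κ gives roughly a 25% change, while taking κ = 1 (shift applied directly at the channel surface potential) gives roughly 47%.

```python
import math

KAPPA = 0.58680     # gate coupling coefficient (fit value from the slide)
I0 = 1.2104e-15     # pre-exponential current in A (fit value from the slide)
U_T = 0.02585       # thermal voltage kT/q in V near 300 K (assumed)


def subthreshold_current(v_gate, kappa=KAPPA, i0=I0, u_t=U_T):
    """Saturated subthreshold current with grounded source (assumed model)."""
    return i0 * math.exp(kappa * v_gate / u_t)


def current_ratio(delta_v, kappa=KAPPA, u_t=U_T):
    """Factor by which the current changes for a voltage shift delta_v."""
    return math.exp(kappa * delta_v / u_t)


shift = 0.010  # 10 mV threshold shift
print(f"I(Vg = 0.5 V) = {subthreshold_current(0.5) * 1e9:.3f} nA")
print(f"fitted kappa: {100 * (current_ratio(shift) - 1):.0f}% change")
print(f"kappa = 1:    {100 * (current_ratio(shift, kappa=1.0) - 1):.0f}% change")
```

Either way, a few millivolts of mismatch move the bias current by tens of percent, which is the motivation for programmable (floating-gate) devices.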
As devices shrink,
Programmable Analog Transistors


•  Standard  CMOS 

•  Data  retention: 

< 5 µV (0.5 µm) (10 year, 300 K)

•  Apps:  Filters,  Data  converters, 

Regulators,  etc. 

Accuracy ~0.1% between
100 pA - 1 µA ~ 10e'

Write degradation (100 µC):

Vtun increase less than 25%
Vinj negligible change


Otherwise,  need  a  DAC  at  every  parameter  and/or  memory,  etc. 


(100 µC is > 10^9 complete FG rewrites)
Industrial Quality Programmable Analog ICs

[Figure: circuit layout; dimension labels 45 µm and 115 µm; inputs In+, In-]

Input Offset Voltage Reduced to ±25 µV


V.  Srinivasan,  G.  Serrano, 
J.  Gray,  and  P.  Hasler, 
CICC  2005,  pp.  739-742. 
(Best  paper  CICC  2005) 


Floating-gate  transistors 
Analog Signal Processing Techniques

Constant Q Filterbanks
Vector-Matrix Multiplication
CADSP = Cooperative Analog-Digital
Signal Processing

Custom Analog ~ 1000 - 10000× more
efficient than Custom Digital (Mead 1990)

• Analog (VMM): 10 MMAC/µW

(= 10 TMAC/W)

• Digital: 4 MMAC/mW (DSP)


[Table: Computation | MMAC/µW | Ratio to digital; first row label: Low Power DSPs]
Resolution for Analog / Digital Tradeoffs

[Figure: axis label Signal-to-Noise (Bits of Resolution); labels: Input, Analog filter bank (~FFT), 10 bit]

[Kucic, et al. 2001]

[Vittoz95, Sarpeshkar98]
Product fabrication cost


Reconfigurable  Signal  Processing 


(Programmable)  (Fixed  Function) 


FPGAs  -  Large  Configurability 

Power:  Just  MAC  engine 
around  2-10MMAC/mW 
Baseline  static  power  ~  0.5W  to  1  W 
Signal  routing  power  /  memory:  ? 


Innovation  and  Process  Scaling  moves 
solutions  towards  programmability 
and  reconfigurability 


DSPs  -  Low  Power  Processing 

-  cell  phones 

(processing  <  30mW  average) 

-  hearing  aids  (1  mW  levels) 

(AMI  /  DSP  factory) 


Power:  54C  series  -  4MMAC/mW 

Power  does  not  include  comm  off  chip 
(i.e.  accessing  memory) 

Power = ½ C Vdd² f for CMOS

Chip to Chip (10 pF load min, 2.5 V):
32 µW/Mbit (dynamic)


Obtaining data for 4 MMAC computation ~ 4 mW
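
The off-chip communication numbers above follow directly from the CMOS switching-power formula. The sketch below reproduces them under one labeled assumption: that each MAC needs 32 bits fetched from off chip (for example two 16-bit operands), which is not stated on the slide but makes the ~4 mW figure come out. Function and constant names are illustrative.

```python
def switching_power(c_load, v_dd, f):
    """Dynamic CMOS power P = 1/2 * C * Vdd^2 * f, in watts."""
    return 0.5 * c_load * v_dd ** 2 * f


C_LOAD = 10e-12   # 10 pF minimum chip-to-chip load (slide)
V_DD = 2.5        # supply voltage in V (slide)

# Power to drive the pin at 1 Mbit/s
per_mbit = switching_power(C_LOAD, V_DD, 1e6)
print(f"{per_mbit * 1e6:.2f} uW per Mbit/s")

BITS_PER_MAC = 32   # assumption: e.g. two 16-bit operands fetched per MAC
MAC_RATE = 4e6      # 4 MMAC per second

io_power = switching_power(C_LOAD, V_DD, MAC_RATE * BITS_PER_MAC)
print(f"{io_power * 1e3:.1f} mW to move data for 4 MMAC")
```

At 4 MMAC/mW the DSP core itself spends about 1 mW on the same 4 MMAC, so under this assumption fetching the data costs several times more than computing on it.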
Moving towards Configurable Analog


Useful  Analog  must  be  Programmable  /  Configurable 

FPAA  = 

Field  Programmable 
Analog  Arrays 

Can be a prototyping tool,
early devices, or
final application

• RASP 1.x (2002)

(T. Hall, P. Hasler, et al., FPL, Sept. 2002.)


•  RASP  2.x: 


3mm 


Switches  are  not  dead  weight 


RASP 2.5, 2.7: 2004-2007

(C. Twigg & P. Hasler, CICC, 2006)

- >50,000 Prog. Analog Devices

- Used by > 100 Eng

RASP 2.8x: 2008-

(A. Basu, et al., CICC, 2008)

- Used by > 50 Eng and growing

RASP 2.9x: 2009-


Custom versus FPGAs: ×2-3 speed, ×10 area, ×100 power
Custom versus FPAAs: < ×2 speed, < ×2 area, < ×2 power
Next Questions on FPAAs


FAQ  on  Large-Scale  FPAAs 

Design time is similar for FPAA-targeted and custom ICs
Size can be similar to custom (programmable caps / 1)

Noise levels are similar to custom design
Similar speed as custom up to routing fabric speed (~10-20 MHz in 0.35 µm CMOS)

Power levels often similar to custom solutions
Techniques scale (~ ideal CMOS rules) with process shrink


Node (nm)    Prog #s (M)    TMACs
350          4.0            1
90           64.0           64
45           256.0          512
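
One way to read the table is as ideal scaling from the 350 nm baseline. The sketch below assumes, as an interpretation rather than something the slides state, that the number of programmable elements grows with density (the square of the linear shrink) and that TMACs grow with density times speed (the cube of the linear shrink). Under that assumption the projections land within about 10% of the table entries.

```python
BASE_NODE_NM = 350
BASE_PROG_M = 4.0    # programmable elements in millions at 350 nm (table)
BASE_TMACS = 1.0     # throughput at 350 nm (table)

# (node in nm, programmable elements in millions, TMACs) as listed in the table
TABLE = [(350, 4.0, 1.0), (90, 64.0, 64.0), (45, 256.0, 512.0)]


def scaled(node_nm):
    """Project element count and throughput under assumed ideal scaling."""
    s = BASE_NODE_NM / node_nm      # linear shrink factor
    prog_m = BASE_PROG_M * s ** 2   # density scales with area
    tmacs = BASE_TMACS * s ** 3     # density times speed (assumed)
    return prog_m, tmacs


for node, table_prog, table_tmacs in TABLE:
    prog_m, tmacs = scaled(node)
    print(f"{node:>3} nm: {prog_m:6.1f} M elements (table {table_prog}), "
          f"{tmacs:6.1f} TMACs (table {table_tmacs})")
```

The table values are round powers of two, which suggests they were obtained by treating each step as a whole number of halvings rather than from the exact node ratio.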

[Figure: tool flow; labels: Extract Spice Netlist, Simulate / Verify, Compiler (RASPER)]


•  Neuromorfix  to  commercialize  FPAA 
technology 

Compiled  circuits  include: 

n-th order filters / filterbanks, Capacitive summation / differencing,

Ramp ADC, Algorithmic and Sigma-Delta ADCs, MP3 encoder, WTA,

Analog Distributed Arithmetic, HMM classifiers, Van der Pol Oscillator
Rapid Prototyping using FPAAs
FPAA Workshops (RASP 2.8x)


LA Workshop

USC Campus, May 2-7, 2008


CO Workshop
Telluride, July 2008


>30 Participants.

> 20 Participants.

ATL Workshop, Oct 2008

>25 Participants.

Other workshops being planned:

Boston, SF, Orlando, DC?

GT Neuromorphic Class
(Fall 2008, >20 students)

Education / training / foundational
theory is critical for designing.

Approved For Public Release, Distribution Unlimited


Simulink  FPAA  Tool 
Function Block Parameters: VMM


VMM  (mask)  (link) 

Performs a voltage-matrix multiplication on the input data. The "size" input below
refers to N, where the input matrix is N rows by N columns and the input vector is N
rows.
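
As a numerical reference for what this block computes, the operation can be sketched in plain Python. This is not the Simulink tool's implementation or the analog circuit; the function name `vmm` and the example weights are hypothetical, and the sketch only fixes the convention of an N by N matrix applied to a length-N input vector.

```python
def vmm(matrix, vector):
    """Multiply an N x N matrix by a length-N vector (one MAC per matrix entry)."""
    n = len(vector)
    if len(matrix) != n or any(len(row) != n for row in matrix):
        raise ValueError("expected an N x N matrix and a length-N vector")
    # Each output is the sum of products along one matrix row.
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


# Hypothetical 3 x 3 weight matrix and input vector
W = [[1.0, 0.5, 0.0],
     [0.0, 1.0, 0.5],
     [0.5, 0.0, 1.0]]
x = [2.0, 4.0, 6.0]

y = vmm(W, x)
macs = len(x) ** 2   # N^2 multiply-accumulates per input vector
print(y, macs)
```

An N by N block costs N² MACs per input vector, which is the quantity the MMAC efficiency figures elsewhere in the slides count.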
[Petre, et al., ISCAS 2008]


Getting  higher  power  efficiency: 
Neuromorphic  Engineering 


400 MMAC / neuron at 20 pW
vs. digital (100 mW)
and analog SP (100 µW)

•  Neuromorphic  processing  =  event-based  processing 
uses  power  only  when  useful  signals  are  present 
(“always  on”  in  sensors  or  further  processing) 


Programmability and Configurability empower
neuromorphic design towards useful applications
in a reasonable timeframe.

-  Address  Event  Representation  (AER)  /  FPGAs 

-  FPAAs  /  FG  devices  - 

~  sizes  of  largest  custom  neuro  ICs 


Can model pyramidal cells in configurable fabric in ~1 mm² area with
realistic channel, dendrite, and synapse elements (power at the nW level and decreasing)



Levels  of  Energy  Efficiency 


Subthreshold 
Transistor  Operation 


Highest  throughput  / 
amount  of  power 


Programmable  Circuits 
(FG  transistors) 

•  Eliminate  mismatch 

•  Programmability 




Analog  Signal  Processing 

• ~ ×1000 improvement
in power efficiency


Configurable  Signal 
Processing 

•  Wide  accessibility 


Moving analog approaches / conceptual framework to a system design approach,
similar to digital's system transformation in the 1970s / 80s.

•  Large  need  for  tools  to  compile  /  program  these  systems. 

• Link is most "useful" at the system / signal processing level

•  Education  /  training  /  foundational  theory  is  critical  for  designing. 

These  techniques  open  further  opportunities  to  utilize  /  explore 
biologically  inspired  techniques 

DARPA  activity: 

ISP,  CT2WS,  SyNAPSE,  Healics,  TEAM 